Cancer Epidemiology, Biomarkers & Prevention — Latest Matching Preprints

1

Prioritizing context-specific genetic risk mechanisms in 11 solid cancers

Wu, X.; Kim, A.; Breeze, C. E.; O'Mara, T. A.; Ramachandran, D.; Dork, T.; Koutros, S.; Rothman, N.; Prokunina-Olsson, L.; Mancuso, N.; Lindstroem, S.; Kraft, P.

2025-11-02 epidemiology 10.1101/2025.10.30.25339145 medRxiv

Top 0.1%

40.2%

Background: While genome-wide association studies (GWAS) have identified hundreds of cancer-associated genetic variants, the specific biological contexts where these variants exert their effects remain largely unknown. We aimed to prioritize context-specific genetic risk mechanisms for 11 solid cancers at both genome-wide and single-variant resolutions. Methods: We integrated cancer GWAS summary statistics from European ancestry samples (avg. n cases=47,856) with ~1,500 context-specific annotations representing candidate cis-regulatory elements. For genome-wide analysis, we applied CT-FM, a method that leverages heritability enrichment estimates and an annotation correlation matrix to select likely disease-relevant biological contexts. After identifying putative causal SNPs (PIP≥0.5) via functionally informed fine-mapping, we used CT-FM-SNP to identify relevant contexts for individual variants. A combined SNP-to-gene framework was applied to construct putative {regulatory SNP-context-gene-cancer} quadruplets. Results: Stratified LD score regression analysis identified 52 annotations with significant heritability enrichment (Bonferroni-corrected P≤0.05). CT-FM prioritized four high-confidence (PIP≥0.5) biological contexts: mammary luminal epithelial cells for breast cancer, a prostate cancer epithelial cell line (VCaP) for prostate cancer, and bulk tumor tissue contexts for colorectal and renal cancers. Variant-level analysis of hundreds of putatively causal SNPs corroborated these findings and identified additional high-confidence contexts for other malignancies, including estrogen receptor-negative breast cancer and bladder cancer. A total of 489 putative regulatory quadruplets were constructed, proposing specific molecular mechanisms underlying the observed GWAS signals. Conclusion: These findings advance our understanding of genetic susceptibility to different cancers. Future work in larger, more diverse GWAS, coupled with more comprehensive annotation atlases, is essential to expand upon and validate our results.

2

Gene-Specific Cancer Patterns in Pathogenic Germline Variant Carriers

Idumah, G.; Ribaudo, I.; Newell, D.; Ni, Y.; Arbesman, J.

2026-01-30 oncology 10.64898/2026.01.27.26344970 medRxiv

Top 0.1%

40.0%

Background: We previously reported that >5% of the population carries pathogenic or likely pathogenic variants (P/LPVs) in key cancer susceptibility genes. However, gene-specific cancer prevalence, spectrum, burden, lifetime risk, comorbidity, and the risk associated with autosomal recessive (AR) genes among carriers remain incompletely defined. Methods: We analyzed 72 cancer susceptibility genes in the All of Us dataset (N=633,547), including 287,076 participants with both genomic and electronic health record data. Cancer diagnoses were identified using SNOMED codes and grouped into 35 categories. Associations between P/LPVs and overall and site-specific cancer risk were evaluated using regression models adjusted for age, sex, race, and ethnicity. Results: Among genes with ≥10 unique carriers, cancer prevalence was highest for MEN1 (80%), followed by TP53 (57.7%), MLH1 (48.4%), and MSH2 (47.2%). Carriers of P/LPVs in BRCA1, BRCA2, MLH1, APC, NF1, PTEN, and PALB2 had significantly earlier cancer diagnosis compared to non-carriers. Cancer prevalence was markedly higher in BRCA1 and BRCA2 carriers who are also mono-allelic MUTYH carriers (75% and 45.5%, respectively) compared with BRCA1 and BRCA2 alone (43.2% and 36.5%). Adjusted survival analysis showed increased cancer risk for MLH1 (OR=6.08), PTEN (OR=5.80), and MSH2 (OR=5.19). Novel associations included MITF with anal/perianal and prostate cancer; BLM with ovarian and soft tissue/sarcoma; WRN with gynecologic cancer (NOS); and FH with hematologic malignancy. Conclusions: This population-based analysis defines gene-specific cancer prevalence, spectrum, and risk, including contributions from AR variants, in the U.S. population. These findings support more precise genetic testing, screening, and risk stratification for individuals carrying inherited P/LPVs.

3

Transcriptome-wide Mendelian randomisation exploring dynamic CD4+ T cell gene expression in colorectal cancer development

Deslandes, B.; Wu, X.; Lee, M. A.; Goudswaard, L. J.; Jones, G. W.; Gsur, A.; Lindblom, A.; Ogino, S.; Vymetalkova, V.; Wolk, A.; Wu, A. H.; Huyghe, J. R.; Peters, U.; Phipps, A. I.; Thomas, C. E.; Pai, R. K.; Grant, R. C.; Buchanan, D. D.; Yarmolinsky, J.; Gunter, M. J.; Zheng, J.; Hazelwood, E.; Vincent, E. E.

2025-04-17 epidemiology 10.1101/2025.04.15.25325863 medRxiv

Top 0.1%

39.6%

Background: Recent research has identified a potential protective effect of higher numbers of circulating lymphocytes on colorectal cancer (CRC) development. However, the importance of different lymphocyte subtypes and activation states in CRC development and the biological pathways driving this relationship remain poorly understood and warrant further investigation. Specifically, CD4+ T cells, a highly dynamic lymphocyte subtype, undergo remodelling upon activation to induce the expression of genes critical for their effector function. Previous studies investigating their role in CRC risk have used bulk tissue, limiting our current understanding of the role of these cells to static, non-dynamic relationships only. Methods: Here, we combined two genetic epidemiological methods, Mendelian randomisation (MR) and genetic colocalisation, to evaluate evidence for causal relationships of gene expression on CRC risk across multiple CD4+ T cell subtypes and activation stages. Genetic proxies were obtained from single-cell transcriptomic data, allowing us to investigate the causal effect of expression of 1,805 genes across five CD4+ T cell activation states on CRC risk (78,473 cases; 107,143 controls). We repeated analyses stratified by CRC anatomical subsites and sex, and performed a sensitivity analysis to evaluate whether the observed effect estimates were likely to be CD4+ T cell-specific. Results: We identified six genes with evidence (FDR-P<0.05 in MR analyses and H4>0.8 in genetic colocalisation analyses) for a causal role of CD4+ T cell expression in CRC development: FADS2, FHL3, HLA-DRB1, HLA-DRB5, RPL28, and TMEM258. We observed differences in causal estimates of gene expression on CRC risk across different CD4+ T cell subtypes and activation timepoints, as well as CRC anatomical subsites and sex. However, our sensitivity analysis revealed that the genetic proxies used to instrument gene expression in CD4+ T cells also act as eQTLs in other tissues, highlighting the challenges of using genetic proxies to instrument tissue-specific expression changes. Conclusions: Our study demonstrates the importance of capturing the dynamic nature of CD4+ T cells in understanding disease risk, and prioritises genes for further investigation in cancer prevention research.

4

DNA Methylation-Derived Immune Cell Proportions and Cancer Risk, Including Lung Cancer, in Black Participants

Semancik, C.; Zhao, N.; Koestler, D.; Boerwinkle, E.; Bressler, J.; Buchsbaum, R.; Kelsey, K. T.; Platz, E. A.; Michaud, D.

2024-05-09 epidemiology 10.1101/2024.05.09.24307118 medRxiv

Top 0.1%

39.2%

Prior cohort studies assessing cancer risk based on immune cell subtype profiles have predominantly focused on White populations. This limitation obscures vital insights into how cancer risk varies across race. Immune cell subtype proportions were estimated using deconvolution based on leukocyte DNA methylation markers from blood samples collected at baseline on participants without cancer in the Atherosclerosis Risk in Communities (ARIC) Study. Over a mean of 17.5 years of follow-up, 668 incident cancers were diagnosed in 2,467 Black participants. Cox proportional hazards regression was used to examine immune cell subtype proportions and overall cancer incidence and site-specific incidence (lung, breast, and prostate cancers). Higher T regulatory cell proportions were associated with statistically significantly higher lung cancer risk (hazard ratio = 1.22, 95% confidence interval = 1.06-1.41 per percent increase). Increased memory B cell proportions were associated with significantly higher risk of prostate cancer (1.17, 1.04-1.33) and all cancers (1.13, 1.05-1.22). Increased CD8+ naive cell proportions were associated with significantly lower risk of all cancers in participants ≥55 years (0.91, 0.83-0.98). Other immune cell subtypes did not display statistically significant associations with cancer risk. These results in Black participants align closely with prior findings in largely White populations. Findings from this study could help identify those at high cancer risk and inform risk stratification to target patients for cancer screening, prevention, and other interventions. Further studies should assess these relationships in other cancer types, better elucidate the interplay of B cells in cancer risk, and identify biomarkers for personalized risk stratification.

5

Association of Genetic Ancestry with Molecular Tumor Profiles in Colorectal Cancer

Rhead, B.; Hein, D.; Pouliot, Y.; Guinney, J.; De La Vega, F. M.; Sanford, N.

2023-07-12 oncology 10.1101/2023.07.12.23292571 medRxiv

Top 0.1%

38.2%

Background: Prior research on molecular correlates of disparities in incidence and outcomes of colorectal cancer (CRC) has typically used self-reported or observed categories of race and ethnicity, which can be missing or inaccurate. Furthermore, race and ethnicity do not always capture genetic similarity well, particularly in admixed populations. To overcome these limitations, we examined associations of CRC tumor molecular profiles using genetic ancestry. Methods: Sequencing was performed with the Tempus xT NGS 648-gene panel and whole exome capture RNA-Seq for 8,454 CRC patients. Genetic ancestry proportions were estimated for five continental groups, Africa (AFR), Americas (AMR), East Asia (EAS), Europe (EUR), and South Asia (SAS), using ancestry informative markers. We assessed association of genetic ancestry proportions and genetic ancestry-imputed race and ethnicity categories with somatic mutations in relevant CRC genes and in expression profiles, including consensus molecular subtypes (CMS). Results: Increased AFR ancestry was associated with higher odds of somatic mutations in APC, KRAS and PIK3CA and lower odds of BRAF mutations. Additionally, increased EAS ancestry was associated with lower odds of mutations in KRAS, EUR with higher odds in BRAF, and the Hispanic/Latino category with lower odds in BRAF. Greater AFR ancestry and the non-Hispanic Black category were associated with higher rates of CMS3, while patients in the Hispanic/Latino category had more indeterminate CMS. Conclusions: Use of genetic ancestry enables identification of molecular differences in CRC tumor mutation frequencies and gene expression that may underlie observed differences by race and ethnicity, and suggests that subtype classifications such as CMS may benefit from greater patient diversity.

6

The Multiethnic Cohort: A Resource for the study of Genetic and non-Genetic Cancer Risk Across Populations

Bogumil, D.; Sheng, X.; Wan, P.; Xia, L.; Pooler, L.; Cheng, I.; Streicher, S.; Huang, B. Z.; Chen, F.; Stram, D.; Shen, S.; King, G.; Chiang, C. W. K.; Ongaco, C.; Adams, M.; McMullen, I.; Zhang, P.; Ling, H.; Mawhinney, M.; Doheny, K. F.; Le Marchand, L.; Wilkens, L. R.; Haiman, C. A.; Conti, D. V.

2025-06-11 epidemiology 10.1101/2025.06.09.25328993 medRxiv

Top 0.1%

34.0%

Introduction: The Multiethnic Cohort Study (MEC) is a U.S. prospective cohort of over 215,000 participants, designed to investigate variation in risk factors and disease across diverse racial and ethnic groups. Over 74,000 participants contributed biospecimens for genetic studies. We describe this sub-cohort and demonstrate the types of analyses it enables. Methods: The MEC recruited adults aged 45-75 in California and Hawaii between 1993 and 1996. Cancer diagnoses were identified via state tumor registries. The MEC Genetics Database includes 73,139 participants with germline genotype data. We evaluated genetic similarity, its relationship with self-reported race/ethnicity, and baseline characteristics, including neighborhood socioeconomic status. Using breast, colorectal, and prostate cancer as examples, the database supports multi-ancestry genome-wide association studies (GWAS), evaluation of non-genetic factors, and time-to-event analyses. Results: Participants included 10,962 African Americans, 24,234 Japanese Americans, 17,242 Latinos, 5,488 Native Hawaiians, 14,649 Whites, and 564 other. Principal component analysis revealed substantial diversity in ancestry. Multiethnic GWAS demonstrated effective control of population stratification while replicating many previously discovered variants. Polygenic risk score (PRS) effects varied by racial and ethnic group. Time-to-event analysis showed associations between cancer incidence and neighborhood socioeconomic status, population descriptors, and genetic similarity. Discussion: The MEC Genetics Database enables comprehensive assessment of genetic and non-genetic cancer risk, revealing differences in absolute risk by race and ethnicity. Studying both types of risk factors in diverse and admixed populations is critical for improving risk characterization and reducing disparities. This resource supports future research in polygenic traits, gene-environment interactions, and integrated risk prediction.

7

Protein-truncating and rare missense variants in ATM and CHEK2 and associations with cancer in UK Biobank whole-exome sequenced data

Mukhtar, T.; Wilcox, N. A.; Dennis, J.; Yang, X.; Naven, M.; Mavaddat, N.; Perry, J.; Gardner, E.; Easton, D.

2024-07-03 epidemiology 10.1101/2024.07.01.24309756 medRxiv

Top 0.1%

33.6%

Background: Deleterious germline variants in ATM and CHEK2 have been associated with a moderately increased risk of breast cancer. Risks for other cancers remain unclear and require further investigation. Methods: Cancer associations for coding variants in ATM and CHEK2 were evaluated using whole-exome sequenced data from UK Biobank linked to cancer registration data (348,488 participants), and analysed both as a retrospective case-control and a prospective cohort study. Odds ratios, hazard ratios, and combined relative risks (RRs) were estimated by cancer type and gene. Separate analyses were performed for protein-truncating variants (PTVs) and rare missense variants (rMSVs; allele frequency <0·1%). Results: PTVs in ATM were associated with increased risks of nine cancers at p<0·001 (pancreas, oesophagus, lung, melanoma, breast, ovary, prostate, bladder, lymphoid leukaemia [LL]), and two at p<0·05 (colon, diffuse non-Hodgkin's lymphoma [DNHL]). Carriers of rMSVs had increased risks of four cancers (p<0·05: stomach, pancreas, prostate, Hodgkin's disease [HD]). RRs were highest for breast, prostate, and any cancer where rMSVs lay in the FAT or PIK domains, and had a CADD score in the highest quintile. PTVs in CHEK2 were associated with three cancers at p<0·001 (breast, prostate, HD), and six at p<0·05 (oesophagus, melanoma, ovary, kidney, DNHL, myeloid leukaemia). Carriers of rMSVs had increased risks of five cancers (p<0·001: breast, prostate, LL; p<0·05: melanoma, multiple myeloma). Conclusion: PTVs in ATM and CHEK2 are associated with a wide range of cancers, with the highest RR for pancreatic cancer in ATM PTV carriers. These findings can inform genetic counselling of carriers.
What is already known on this topic: While previous research shows there is evidence for association between variants in ATM or CHEK2 and multiple cancer types in individual smaller studies, the associations have not been consistently evaluated across all cancer types and, with the exception of breast cancer, the strengths of association are unclear. What this study adds: We examined data from a large cohort study to derive relative and absolute risks for all cancer types for carriers of PTVs and rMSVs in CHEK2 and ATM. ATM PTVs were associated with significantly increased risk for 11 of 23 sites examined (nine at p<0·001), with the relative risk being highest for pancreatic cancer (approximately seven-fold). Carriers of rMSVs had increased risks of four cancers, with a RR of approximately 1·5. For CHEK2 PTVs, statistically significant risks were observed for seven of the 21 sites examined (one at p<0·001). Carriers of rMSVs had increased risks of five cancers with the risk being highest for lymphoid leukaemia (approximately two-fold). How this study might affect research, practice or policy: ATM and CHEK2 are included on many cancer gene panels used in family cancer clinics, and the risk estimates from these analyses can inform genetic counselling for carriers. The estimated absolute risks for pancreatic cancer in ATM PTV carriers (11% in males and 8% in females by age 85) are notably higher than for other major pancreatic susceptibility genes including BRCA2, CDKN2A, and PALB2. Our findings can also inform NICE guidelines for pancreatic cancer, which do not currently include ATM.

8

Redefining the Bladder Cancer Phenotype using Patterns of Familial Risk

Hanson, H. A.; Leiser, C. L.; Martin, C.; Gupta, S.; Smith, K. R.; Dechet, C.; Lowrance, W.; O'Neil, B.; Camp, N. J.

2019-08-16 epidemiology 10.1101/19003681 medRxiv

Top 0.1%

33.3%

Relatives of bladder cancer (BCa) patients have been shown to be at increased risk for kidney, lung, thyroid, and cervical cancer after correcting for smoking-related behaviors that may concentrate in some families. We demonstrate a new method to simultaneously assess risks for multiple cancers to identify distinct multi-cancer configurations (multiple different cancer types that cluster in relatives) surrounding BCa patients. We identified 6,416 individuals with urothelial carcinoma and familial information using the Utah Cancer Registry and Utah Population Database (UPDB). First-degree relatives, second-degree relatives, and first cousins were used to construct a familial enrichment matrix for cancer types previously shown to be individually associated with BCa. K-medoids clustering was used to identify Familial Multi-Cancer Configurations (FMC). A case-control design and Cox regression with a 1:5 ratio of BCa cases to cancer-free controls was used to quantify the risk in specific relative-types and spouses in each FMC. Clustering analysis revealed 12 distinct FMCs, each exhibiting a different pattern of cancer co-aggregation. Of the 12 FMCs, four exhibited strong familial risk of bladder cancer along with specific patterns of increased risk of cancers in other sites (BCa FMCs), and were the focus of further investigation. Cancers at increased risk in these four BCa FMCs most commonly included melanoma, prostate and breast cancer and less commonly included leukemia, lung, pancreas and kidney cancer. A network-based approach can be used with familial data to discover new phenotype clusters for BCa, providing new directions for discovering patterns of cancer clustering.

9

Cohort studies on melanoma and keratinocyte skin cancer: a systematic review

Olsen, C.; Whiteman, D. C.; Neale, R. E.

2025-08-15 epidemiology 10.1101/2025.08.13.25332927 medRxiv

Top 0.1%

32.6%

The incidence of cutaneous malignancies is increasing worldwide, presenting an important public health burden. Cohort studies can provide high quality data on the epidemiology of these cancers, and are invaluable for deriving measures of disease burden used to inform prevention, diagnosis and treatment. We conducted a systematic review of the literature to summarise the characteristics of cohort studies that have published one or more papers describing the epidemiology of melanoma and/or keratinocyte cancers. Eligible studies were population-based cohort studies that have published findings on incidence or etiology of melanoma or keratinocyte cancer (including associations with phenotypic, environmental, and genetic factors). We excluded clinical cohorts focused on survivorship outcomes. We searched MEDLINE 1950 (U.S. National Library of Medicine, Bethesda, MD, USA), the ISI Science Citation Index (1990 to 31 July 2025) and the reference lists of retrieved articles, imposing no language restrictions. We identified 22 eligible cohort studies, 20 of which had published on melanoma, and 16 on keratinocyte cancer. Nine were conducted in the United States, eleven in Europe, and two in Australia. There was substantial variability in terms of cohort size, risk factor information recorded at baseline, and other data collected (e.g., health services, genetic). Only three studies were specifically designed to examine skin cancers as study endpoints, and only two cohorts pre-specified both melanoma and keratinocyte cancer endpoints. Our summary provides a resource for skin cancer researchers conducting investigations into the causes, burden and prevention of these important cancers.

10

VOYAGER: an international consortium investigating the role of human papilloma virus and genetics in oral and oropharyngeal cancer risk and survival

Gormley, M.; Adhikari, A.; Dudding, T.; Pring, M.; Hurley, K.; Macfarlane, G. J.; Lagiou, P.; Lagiou, A.; Polesel, J.; Agudo, A.; Alemany, L.; Ahrens, W.; Healy, C. M.; Conway, D. I.; Canova, C.; Holcatova, I.; Richiardi, L.; Znaor, A.; Olshan, A. F.; Hung, R. J.; Liu, G.; Bratman, S.; Zhao, X.; Holt, J.; Cortez, R.; Gaborieau, V.; McKay, J. D.; Brennan, P.; Waterboer, T.; Hayes, N.; Diergaarde, B.; Virani, S.

2025-02-21 epidemiology 10.1101/2025.02.17.25322399 medRxiv

Top 0.1%

28.3%

Head and neck cancer (HNC) is the sixth most common cancer globally. Incidence and survival rates vary significantly across geographic regions and tumor subsites. This is partly due to differences in risk factor exposure, which includes tobacco smoking, alcohol consumption and human papillomavirus (HPV) infection, alongside detection and treatment strategies. The VOYAGER (human papillomaVirus, Oral and oropharYngeal cAncer GEnomic Research) consortium is a collaboration between five large North American and European studies which generated data on 10,530 participants (7,233 cases and 3,297 controls). The primary goal of the collaboration was to improve understanding of the role of HPV and genetic factors in oral cavity and oropharyngeal cancer risk and outcome. Demographic and clinical data collected by the five studies were harmonized, and HPV status was determined for the majority of cases. In addition, 999 tumors were sequenced to define somatic mutations. These activities generated a comprehensive biomedical resource that can be utilized to answer critical outstanding research questions to help improve HNC prevention, early detection, treatment, and surveillance.

11

Epigenetic field cancerization in breast cancer using subject-matched tumor, ipsilateral-normal, and contralateral-normal tissues

Muse, M. E.; Titus, A. J.; Salas, L. A.; Wilkins, O. M.; Mullen, C.; Gregory, K. J.; Schneider, S. S.; Crisi, G. M.; Jarwale, R. M.; Otis, C. N.; Christensen, B. C.; Arcaro, K. F.

2019-07-12 epidemiology 10.1101/19002014 medRxiv

Top 0.1%

27.0%

Background: Emerging work has demonstrated that histologically normal (non-tumor) tissue adjacent to breast tumor tissue shows evidence of molecular alterations related to tumorigenesis, referred to as field cancerization effects. Although changes in DNA methylation are known to occur early in breast carcinogenesis and the landscape of breast tumor DNA methylation is profoundly altered compared with normal tissue, there have been limited efforts to identify DNA methylation field cancerization effects in histologically normal breast tissue adjacent to tumor. Methods: Matched tumor, histologically normal tissue of the ipsilateral breast (ipsilateral-normal), and histologically normal tissue of the contralateral breast (contralateral-normal) were obtained from nine women undergoing bilateral mastectomy. Laser capture microdissection was used to select breast epithelial cells from normal tissues, and neoplastic cells from tumor specimens for genome-scale measures of DNA methylation with the Illumina HumanMethylationEPIC array. Results: We identified substantially more CpG loci that were differentially methylated between contralateral-normal breast and tumor tissue (63,271 CpG loci q < 0.01), than between ipsilateral-normal tissue and tumor (38,346 CpG loci q < 0.01). In addition, we identified differential methylation in ipsilateral-normal relative to contralateral-normal tissue (9,562 CpG loci p < 0.01). Hypomethylated loci in ipsilateral-normal relative to contralateral-normal tissue were significantly enriched for breast cancer-relevant transcription factor binding sites including those for ESR1, FoxA1, and GATA3. Hypermethylated loci in ipsilateral-normal relative to contralateral-normal tissue were significantly enriched for CpG island shore regions. Conclusions: Our results indicate that early hypermethylation events in breast carcinogenesis are more likely to occur in the regions immediately surrounding CpG islands than CpG islands per se, reflecting a field effect of the tumor on surrounding histologically normal tissue. This work offers an opportunity to focus investigations of early DNA methylation alterations in breast carcinogenesis and potentially develop epigenetic biomarkers of disease risk.

12

Genetically-proxied anti-diabetic drug target perturbation and risk of cancer: a Mendelian randomization analysis

Yarmolinsky, J.; Bouras, E.; Constantinescu, A.-E.; Burrows, K.; Bull, C. J.; Vincent, E. E.; Martin, R.; Dimopoulou, O.; Lewis, S. J.; Moreno, V. J.; Vujkovic, M.; Chang, K.-M.; Voight, B. F.; Tsao, P. S.; Gunter, M. J.; Hampe, J.; Lindblom, A.; Pellatt, A. J.; Pharoah, P.; Schoen, R. E.; Gallinger, S.; Jenkins, M. A.; Pai, R. K.; the PRACTICAL consortium; VA Million Veteran Program; Gill, D.; Tsilidis, K. K.

2022-10-26 epidemiology 10.1101/2022.10.24.22281370 medRxiv

Top 0.1%

23.3%

Aims/hypothesis: Epidemiological studies have generated conflicting findings on the relationship between anti-diabetic medication use and cancer risk. Naturally occurring variation in genes encoding anti-diabetic drug targets can be used to investigate the effect of their pharmacological perturbation on cancer risk. Methods: We developed genetic instruments for three anti-diabetic drug targets (peroxisome proliferator activated receptor gamma, PPARG; sulfonylurea receptor 1, ABCC8; glucagon-like peptide 1 receptor, GLP1R) using summary genetic association data from a genome-wide association study (GWAS) of type 2 diabetes in 69,869 cases and 127,197 controls in the Million Veteran Program. Genetic instruments were constructed using cis-acting genome-wide significant (P<5×10⁻⁸) single-nucleotide polymorphisms (SNPs) permitted to be in weak linkage disequilibrium (r²<0.20). Summary genetic association estimates for these SNPs were obtained from GWAS consortia for the following cancers: breast (122,977 cases, 105,974 controls), colorectal (58,221 cases, 67,694 controls), prostate (79,148 cases, 61,106 controls), and overall (i.e. site-combined) cancer (27,483 cases, 372,016 controls). Inverse-variance weighted random-effects models adjusting for linkage disequilibrium were employed to estimate causal associations between genetically-proxied drug target perturbation and cancer risk. Colocalisation analysis was employed to examine robustness of findings to violations of Mendelian randomization (MR) assumptions. A Bonferroni correction was employed as a heuristic to define associations from MR analyses as "strong" and "weak" evidence. Results: In Mendelian randomization analysis, genetically-proxied PPARG perturbation was weakly associated with higher risk of prostate cancer (OR for PPARG perturbation equivalent to a 1 unit decrease in inverse-rank normal transformed HbA1c: 1.75, 95% CI 1.07-2.85, P=0.02). In histological subtype-stratified analyses, genetically-proxied PPARG perturbation was weakly associated with lower risk of ER+ breast cancer (OR 0.57, 95% CI 0.38-0.85; P=6.45×10⁻³). In colocalisation analysis, however, there was little evidence of shared causal variants for type 2 diabetes liability and cancer endpoints in the PPARG locus, though these analyses were likely underpowered. There was little evidence to support associations of genetically-proxied PPARG perturbation with colorectal or overall cancer risk or genetically-proxied ABCC8 or GLP1R perturbation with risk across cancer endpoints. Conclusions/interpretation: Our drug-target MR analyses did not find consistent evidence to support an association of genetically-proxied PPARG, ABCC8 or GLP1R perturbation with breast, colorectal, prostate or overall cancer risk. Further evaluation of these drug targets using alternative molecular epidemiological approaches may help to further corroborate the findings presented in this analysis.
Research in context. What is already known about this subject? Anti-diabetic medication use is variably linked to both increased and decreased cancer risk in conventional epidemiological studies. It is unclear whether these associations represent causal relationships. What is the key question? What is the association of genetically-proxied perturbation of three anti-diabetic drug targets (PPARG, ABCC8, GLP1R) with risk of breast, colorectal, prostate and overall cancer? What are the new findings? Genetically-proxied PPARG perturbation was weakly associated with higher risk of prostate cancer and lower risk of ER+ breast cancer. There was little evidence that liability to type 2 diabetes and these cancer endpoints shared one or more causal variants in the PPARG locus, a necessary precondition to infer causality between PPARG perturbation and cancer risk. How might this impact on clinical practice in the foreseeable future? Our drug-target Mendelian randomization analyses did not find consistent evidence to support a link between genetically-proxied perturbation of PPARG, ABCC8, and GLP1R and risk of breast, colorectal, prostate and overall cancer. These findings suggest that on-target effects of PPARG agonists, sulfonylureas, and GLP1R agonists are unlikely to confer large effects on breast, colorectal, prostate, or overall cancer risk.
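
The abstract above names an inverse-variance weighted (IVW) random-effects estimator. As a rough illustration only, the Python sketch below combines per-SNP Wald ratios into an IVW estimate with a multiplicative random-effects standard error. It is not the authors' implementation: it assumes uncorrelated SNPs (the preprint adjusts for linkage disequilibrium, which this sketch does not), and the function name `ivw_estimate` and all input values are hypothetical.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """IVW Mendelian randomization estimate (sketch, assumes uncorrelated SNPs).

    Returns (estimate, standard_error) on the scale of the outcome betas,
    e.g. log odds ratio per unit change in the exposure.
    """
    # Per-SNP Wald ratio: SNP-outcome effect divided by SNP-exposure effect.
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    # First-order inverse-variance weight of each Wald ratio.
    weights = [bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    total_weight = sum(weights)

    estimate = sum(w * r for w, r in zip(weights, ratios)) / total_weight
    se_fixed = math.sqrt(1.0 / total_weight)

    # Multiplicative random effects: inflate the SE when Cochran's Q
    # indicates more heterogeneity than expected, never deflate it.
    q_stat = sum(w * (r - estimate) ** 2 for w, r in zip(weights, ratios))
    dof = len(ratios) - 1
    scale = max(1.0, math.sqrt(q_stat / dof)) if dof > 0 else 1.0
    return estimate, se_fixed * scale


# Hypothetical summary statistics for three independent SNPs.
est, se = ivw_estimate(
    beta_exposure=[0.1, 0.2, 0.4],
    beta_outcome=[0.05, 0.1, 0.2],
    se_outcome=[0.01, 0.01, 0.01],
)
print(f"log OR = {est:.3f}, OR = {math.exp(est):.3f}, SE = {se:.4f}")
```

Exponentiating the estimate gives an odds ratio comparable in form to those reported in the abstract, although the numbers here are made up and carry no biological meaning.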

13

Disparities in US public historic cancer mortality data: Advocacy for gastrointestinal cancers, tailored prevention measures, and the inclusion of oversea deaths. -Under-reporting of cancer deaths as sign of health disparity

Zhang, T.; Valle, J.; Patel, A. A.; Lindsey, S.; Smith, A.; Hung, T. K. W.

2025-01-23 epidemiology 10.1101/2025.01.22.25320955 medRxiv

Top 0.1%

23.2%

Historically, data from US death certificates, available through the CDC's WONDER database, have been used to highlight health disparities. The 5-year (2018-2022) underlying-cause-of-death data were analyzed for different race/ethnicity groups. Gastrointestinal (GI) cancers of the colon/rectum, pancreas, liver/bile duct, and stomach contributed to more than a quarter of cancer deaths in the US, calling for focused advocacy. Known disparities for Black non-Hispanic Americans were verified in cancers of the colon/rectum and pancreas. However, mortality data for the "More than one race" non-Hispanic group or the non-White Hispanic group appeared unreliable, suggesting that under-reporting is also a sign of health disparity. Age-specific death rates (ASDRs) were calculated to view health disparities in various age groups. Cancers of the colon/rectum, liver, and stomach cause significant mortality in the under-50 population, and minority groups are more likely to die from cancers of the liver or the stomach than White non-Hispanics. The liver cancer crude death rate was lower than expected in Asian non-Hispanics when compared with the high mortality in IARC's GLOBOCAN estimates for Asian countries. Census data were then used to calculate ASDRs for Asian subgroups. Significantly higher risks were seen in gastric cancer for Korean Americans and liver cancer for Vietnamese Americans. The Asian Indian group had the lowest death rates across several GI cancers, even in gallbladder cancer. Some immigrants go back to their birth countries at the end of life, and these deaths are not reflected in WONDER because consulate reports of American citizens' deaths abroad do not include race/ethnicity data. SIGNIFICANCE: Gastrointestinal cancers should be a focus of advocacy to promote cancer equity. Under-reporting of deaths for minority groups is a sign of health disparity. ASDRs provide a basis for tailored screening and prevention.
Exclusion of American citizens' deaths abroad from U.S. public health datasets may mask care access issues faced by immigrant cancer patients who may die overseas. Updating how these deaths are reported will help address this issue.
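
The age-specific death rate mentioned in the abstract above is conventionally the number of deaths in an age band divided by the population of that band, scaled to 100,000 people. A minimal Python sketch under that assumption follows; the function name, the age bands, and all counts are hypothetical and are not taken from WONDER or the Census.

```python
def age_specific_death_rate(deaths: int, population: int, per: int = 100_000) -> float:
    """Deaths in one age band per `per` people in that band."""
    if population <= 0:
        raise ValueError("population must be positive")
    return deaths * per / population


# Hypothetical counts for illustration only (not real WONDER or Census data).
example_bands = {
    "40-49": {"deaths": 120, "population": 800_000},
    "50-59": {"deaths": 450, "population": 750_000},
}

# One rate per age band, keyed by the band label.
rates = {
    band: age_specific_death_rate(counts["deaths"], counts["population"])
    for band, counts in example_bands.items()
}
```

For a multi-year window such as 2018-2022, the denominator would normally be the population summed over the same years; that detail is an assumption here, since the abstract does not describe it.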

14

Exploring the causal role of the human gut microbiome in colorectal cancer: Application of Mendelian randomization

Hatcher, C.; Richenberg, G.; Waterson, S.; Nguyen, L. H.; Joshi, A. D.; Carreras-Torres, R.; Moreno, V.; Chan, A. T.; Gunter, M.; Lin, Y.; Qu, C.; Song, M.; Casey, G.; Figueiredo, J. C.; Gruber, S. B.; Hampe, J.; Hampel, H.; Jenkins, M. A.; Keku, T. O.; Peters, U.; Tangen, C. M.; Wu, A. H.; Hughes, D. A.; Ruhlemann, M. C.; Raes, J.; Timpson, N. J.; Wade, K. H.

2022-10-17 epidemiology 10.1101/2022.10.14.22281077 medRxiv

Top 0.1%

23.2%

Aim: The role of the human gut microbiome in colorectal cancer (CRC) is unclear as most studies on the topic are unable to discern correlation from causation. We apply two-sample Mendelian randomization (MR) to estimate the causal relationship between the gut microbiome and CRC. Materials and methods: We used summary-level data from independent genome-wide association studies to estimate the causal effect of 14 microbial traits (n=3,890 individuals) on overall CRC (55,168 cases, 65,160 controls) and site-specific CRC risk, conducting several sensitivity analyses to understand the nature of results. Results: Initial MR analysis suggested that a higher abundance of Bifidobacterium and presence of an unclassified group of bacteria within the Bacteroidales order in the gut increased overall and site-specific CRC risk. However, sensitivity analyses suggested that instruments used to estimate relationships were likely complex and involved in many potential horizontal pleiotropic pathways, demonstrating that caution is needed when interpreting MR analyses with gut microbiome exposures. In assessing reverse causality, we did not find strong evidence that CRC causally affected these microbial traits. Conclusions: Whilst our study initially identified potential causal roles for two microbial traits in CRC, importantly, further exploration of these relationships highlighted that these were unlikely to reflect causality.
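
The abstract above does not say which estimator was used. As an illustration only, two-sample MR with summary statistics is often summarized with per-variant Wald ratios combined by fixed-effect inverse-variance weighting (IVW). The sketch below assumes that estimator; the function name and all effect sizes are hypothetical.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted MR estimate and its standard error.

    Each variant contributes a Wald ratio (beta_outcome / beta_exposure),
    weighted by beta_exposure**2 / se_outcome**2.
    """
    weights = [bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    total_weight = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total_weight
    standard_error = math.sqrt(1.0 / total_weight)
    return estimate, standard_error


# Hypothetical summary statistics for two instruments (not from the study).
bx = [0.10, 0.20]   # variant effect on the microbial trait
by = [0.02, 0.04]   # variant effect on log-odds of CRC
se = [0.01, 0.01]   # standard error of the outcome effects

estimate, standard_error = ivw_estimate(bx, by, se)
```

The IVW estimate is unbiased only if the instruments have no horizontal pleiotropy, which is the assumption the abstract's sensitivity analyses call into question.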

15

Using DEPendency of association on the number of Top Hits (DEPTH) as a complementary tool to identify novel risk loci in colorectal cancer

Lai, J.; Wong, C.; Schmidt, D. F.; Kapuscinski, M.; Alpen, K.; MacInnis, R. J.; Buchanan, D. D.; Win, A. K.; Figueiredo, J.; Chan, A. T.; Harrison, T. A.; Hoffmeister, M.; White, E.; Marchand, L. L.; Peters, U.; Hopper, J. L.; Makalic, E.; Jenkins, M. A.

2022-11-27 epidemiology 10.1101/2022.11.24.22282734 medRxiv

Top 0.1%

23.1%

Background: DEPendency of association on the number of Top Hits (DEPTH) is an approach to identify candidate risk regions by considering the risk signals from overlapping groups of sequential variants across the genome. Methods: We applied a DEPTH analysis using a sliding window of 200 SNPs to colorectal cancer (CRC) data from the Colon Cancer Family Registry (CCFR) (5,735 cases and 3,688 controls) and GECCO (8,865 cases and 10,285 controls) studies. A DEPTH score >1 was used to identify risk regions common to both studies. We compared DEPTH results against those from conventional GWAS analyses of these two studies as well as against 132 published risk regions. Results: Initial DEPTH analysis revealed 2,622 (CCFR) and 3,686 (GECCO) risk regions, of which 569 were common to both studies. Bootstrapping revealed 40 and 49 likely risk regions in the CCFR and GECCO data sets, respectively. Notably, DEPTH identified at least 82 likely risk regions that would not be detected using conventional GWAS methods, nor had they been identified in previous CRC GWASs. We found four reproducible risk regions (2q22.2, 2q33.1, 6p21.32, 13q14.3), with the HLA locus at 6p21 having the highest DEPTH score. The strongest associated SNPs were rs762216297, rs149490268, rs114741460, and rs199707618 for the CCFR data, and rs9270761 for the GECCO data. Conclusion: DEPTH can identify novel likely risk regions for CRC not identified using conventional analyses of much larger datasets. Impact: DEPTH has potential as a powerful complementary tool to conventional GWAS analyses for identifying risk regions within the genome.

16

Risks of Subsequent Primary Extracolonic Cancers for Colorectal Cancer Survivors: A Study Protocol for the Development and Validation of a Risk Prediction Model

Aung, Y. K.; Jenkins, M.; Baxter, N. N.; Win, A. K.

2025-12-09 epidemiology 10.64898/2025.12.08.25341861 medRxiv

Top 0.1%

22.7%

Background: The risk of developing a second cancer following colorectal cancer poses a significant challenge for colorectal cancer survivors, as 10-20% of survivors experience a subsequent primary cancer. These additional diagnoses are significantly associated with lower survival rates and increased morbidity compared with those who do not develop a subsequent primary cancer. Identifying survivors at the highest risk for subsequent primary cancers is imperative for preventive strategies and surveillance. This study protocol outlines the development and internal and external validation of risk prediction models, estimating individual risks of subsequent extracolonic cancer (cancers outside the colon and rectum) over 3-, 5-, 10-, and 15-year periods. Methods: This study will include adult patients aged 18 years or older with a history of invasive colon, rectal, or colorectal cancer diagnosed within one year prior to recruitment; at least one year of follow-up post-recruitment; no prior nonskin cancers; and no genetic predispositions, such as Lynch syndrome. Data will be sourced from the Colon Cancer Family Registry Cohort, which recruits participants from population-based cancer registries. The primary outcome is the first diagnosis of a subsequent primary extracolonic cancer, excluding metastases or recurrence of any primary cancer. Candidate predictors will include sociodemographic factors, comorbidities, lifestyle factors, clinicopathological factors, and hormonal factors such as menopausal status and hormone use (in women). Initial predictor selection will be performed using five methods: backward stepwise elimination via flexible parametric regression, Cox regression, Fine and Gray subdistribution hazard regression, Lasso Cox regression, and elastic net regression. Predictive performance will be assessed through accuracy measures, discrimination (C and D statistics), R² statistics, and calibration plots to select the best-performing final model.
Internal validation will involve bootstrapping 500 samples to estimate optimism-corrected C statistics. We will calculate individualized risks for subsequent extracolonic cancer at 3-, 5-, 10-, and 15-year intervals using shrinkage-adjusted beta coefficients of the selected predictors in the final model, along with baseline hazards derived from two approaches: one that accounts for death as a competing risk, and one that does not. External validation will be performed by testing the final model on an independent cohort from the Melbourne Collaborative Cohort Study. Discussion: This model may help clinicians identify colorectal cancer survivors at increased risk of subsequent extracolonic cancers, enabling early detection, lifestyle modifications, and personalized screening strategies. Early intervention could subsequently reduce morbidity and improve long-term outcomes for these colorectal cancer survivors.
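
As a rough illustration of how an individualized risk could be derived from shrinkage-adjusted coefficients and a baseline hazard, the sketch below uses the standard proportional-hazards form, risk(t) = 1 - exp(-H0(t) * exp(lp)). It is an assumption-laden sketch, not the protocol's implementation: the function name, the single uniform shrinkage factor, and all numeric values are hypothetical.

```python
import math


def absolute_risk(cumulative_baseline_hazard, coefficients, covariates, shrinkage=1.0):
    """Predicted risk by time t under a proportional-hazards model.

    cumulative_baseline_hazard: H0(t) at the horizon of interest (e.g. 5 years).
    coefficients / covariates: fitted betas and one person's predictor values.
    shrinkage: uniform factor applied to the linear predictor (an assumption).
    """
    linear_predictor = shrinkage * sum(b * x for b, x in zip(coefficients, covariates))
    return 1.0 - math.exp(-cumulative_baseline_hazard * math.exp(linear_predictor))


# Hypothetical model: two predictors, a 5-year cumulative baseline hazard of 0.05,
# and a shrinkage factor of 0.9. None of these values come from the protocol.
risk_5y = absolute_risk(0.05, [0.4, -0.2], [1.0, 2.0], shrinkage=0.9)
```

The same expression applies when the cumulative baseline hazard is a Fine and Gray subdistribution hazard, in which case the result is a cumulative incidence that accounts for death as a competing risk.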

17

Early Life-Course Patterns Of Registry-Defined Subsequent Cancers After HPV-Related Malignancies In A U.S. Population-Based Cohort

Torres Del Valle, J. M.; Amaya Ardila, C. P.; Malave Rivera, S. M.

2026-01-16 epidemiology 10.64898/2026.01.14.26344109 medRxiv

Top 0.1%

22.7%

Background: Subsequent primary malignancies following human papillomavirus (HPV)-related cancers represent an important survivorship concern. However, evidence remains limited regarding sociodemographic and clinical factors associated with registry-defined subsequent cancers among children, adolescents, and young adults in U.S. population-based cohorts. Methods: We conducted a retrospective population-based analysis of 1,326 individuals diagnosed with HPV-related cancers using Surveillance, Epidemiology, and End Results (SEER) data. Registry-defined subsequent cancer was operationalized as the occurrence of additional primary HPV-related malignancies according to SEER multiple primary rules. Multivariable logistic regression models estimated associations with sex, age group, area-level socioeconomic status (Yost Index quintiles), persistent poverty census tract status, and primary cancer site. Sex-stratified analyses by cancer site were performed. Results: Registry-defined subsequent cancers were significantly associated with female sex and young adult age (20-29 years). Females had higher odds of subsequent cancer compared with males (OR = 1.06, 95% CI: 1.03-1.10), and individuals aged 20-29 years had higher odds than those aged 0-9 years (OR = 1.10, 95% CI: 1.05-1.16). Associations persisted after adjustment for socioeconomic indicators. No significant associations were observed with Yost Index quintiles or persistent poverty. Sex-stratified analyses showed higher odds of subsequent cancer for anal cancer among males and vulvar cancer among females relative to oropharyngeal cancer. Conclusions: Sex and age are key determinants of registry-defined subsequent cancers following HPV-related malignancies, independent of area-level socioeconomic context. These findings support age- and sex-specific survivorship surveillance strategies across early life-course stages.

18

Investigating the Role of Neighborhood Socioeconomic Status and Germline Genetics on Prostate Cancer Risk

Judd, J.; Spence, J. P.; Pritchard, J. K.; Kachuri, L.; Witte, J. S.

2024-08-02 epidemiology 10.1101/2024.07.31.24311312 medRxiv

Top 0.1%

22.7%

Background: Genetic factors play an important role in prostate cancer (PCa) development with polygenic risk scores (PRS) predicting disease risk across genetic ancestries. However, there are few convincing modifiable factors for PCa and little is known about their potential interaction with genetic risk. We analyzed incident PCa cases (n=6,155) and controls (n=98,257) of European and African ancestry from the UK Biobank (UKB) cohort to evaluate the role of neighborhood socioeconomic status (nSES), and how it may interact with PRS, on PCa risk. Methods: We evaluated a multi-ancestry PCa PRS containing 269 genetic variants to understand the association of germline genetics with PCa in UKB. Using the English Indices of Deprivation, a set of validated metrics that quantify lack of resources within geographical areas, we performed logistic regression to investigate the main effects and interactions between nSES deprivation, PCa PRS, and PCa. Results: The PCa PRS was strongly associated with PCa (OR=2.04; 95%CI=2.00-2.09; P<0.001). Additionally, nSES deprivation indices were inversely associated with PCa: employment (OR=0.91; 95%CI=0.86-0.96; P<0.001), education (OR=0.94; 95%CI=0.83-0.98; P<0.001), health (OR=0.91; 95%CI=0.86-0.96; P<0.001), and income (OR=0.91; 95%CI=0.86-0.96; P<0.001). The PRS effects showed little heterogeneity across nSES deprivation indices, except for the Townsend Index (P=0.03). Conclusions: We reaffirmed genetics as a risk factor for PCa and identified nSES deprivation domains that influence PCa detection and are potentially correlated with environmental exposures that are risk factors for PCa. These findings also suggest that nSES and genetic risk factors for PCa act independently.

19

Characterizing cancer patterns in Okinawan vs. mainland Japanese Americans: The Multiethnic Cohort Study

Streicher, S. A.; Guillermo, C.; Park, S.; Chiang, C.; Shepherd, J.; Sheng, X.; Bogumil, D.; Park, S. L.; Cheng, I.; Lim, U.; Franke, A.; Stram, D.; Conti, D. V.; Haiman, C.; Wilkens, L.; Le Marchand, L.

2025-07-15 epidemiology 10.1101/2025.07.14.25331338 medRxiv

Top 0.1%

22.6%

Differences in cancer rates have been documented in Japan between Okinawa and mainland Japan. Limited data exist on whether these differences are also present for established populations of Okinawans and mainland Japanese in the United States. Dimensionality reduction techniques for genetic data combined with Okinawan surnames were used to identify Multiethnic Cohort Japanese American participants (N=24,484) of Okinawan or mainland descent. Cox proportional hazards models were used to compare cancer incidence between Okinawan and mainland Japanese participants. Geometric means were examined on a subset of MEC participants for circulating blood biomarker levels (N=2,980) and body composition (N=399). The Okinawan cluster included 3,649 individuals and the mainland cluster included 19,611 individuals. Okinawan individuals were more likely to have a higher average body mass index and shorter stature, better diet quality score, higher total energy intake, more alcohol consumption among drinkers, and a history of never smoking compared to mainland Japanese (all p-values<0.0001). In multivariable adjusted models, Okinawan women were more likely to be diagnosed with breast cancer (HR=1.36, 95% CI=1.07-1.73) and Okinawan men were less likely to be diagnosed with aggressive prostate cancer (HR=0.67, 95% CI=0.51-0.87) compared to their mainland Japanese counterparts. In subsets of MEC participants, adiponectin levels were lower, and C-reactive protein levels, visceral adipose tissue area (VAT) and the VAT-to-subcutaneous adipose tissue area ratio were higher, in Okinawans compared to mainland Japanese (all p-values<0.05). Results in this US-based sample are consistent with recent trends of higher breast and lower prostate cancer incidence rates in Okinawans reported from Japan.
Novelty and impact: Cancer rate differences have been documented in Japan between Okinawa and mainland Japan; however, limited data exist on whether these differences are present in Japanese Americans of Okinawan or mainland descent. We report significant differences in breast cancer, prostate cancers, body composition, and obesity-related biomarkers for these two groups. Our findings suggest that these cancer risk disparities may not be solely due to lifestyle, but could be explained by body composition, genetics, or unmeasured factors.

20

Red meat intake interacts with a TGF-β-pathway-based polygenic risk score to impact colorectal cancer risk: Application of a novel approach for polygenic risk score construction.

Sanchez Mendez, J.; Queme, B.; Fu, Y.; Morrison, J.; Lewinger, J. P.; Kawaguchi, E.; Mi, H.; Obon-Santacana, M.; Moratalla-Navarro, F.; Martin, V.; Moreno, V.; Lin, Y.; Bien, S. A.; Qu, C.; Su, Y.-R.; White, E.; Harrison, T. A.; Huyghe, J. R.; Tangen, C. M.; Newcomb, P. A.; Phipps, A. I.; Thomas, C. E.; Conti, D. V.; Wang, J.; Platz, E. A.; Keku, T. O.; Newton, C. C.; Um, C. Y.; Kundaje, A.; Shcherbina, A.; Murphy, N.; Gunter, M. J.; Dimou, N.; Papadimitriou, N.; Bezieau, S.; van Duijnhoven, F. J.; Männistö, S.; Rennert, G.; Wolk, A.; Hoffmeister, M.; Brenner, H.; Chang-Claude, J.; Tian, Y.;

2025-06-16 epidemiology 10.1101/2025.06.13.25329599 medRxiv

Top 0.1%

22.5%

Background: Red and/or processed meat are established colorectal cancer (CRC) risk factors. Genome-wide association studies (GWAS) have reported over 200 variants associated with CRC risk. We used functional annotation data to identify subsets of variants within known pathways to construct pathway-based Polygenic Risk Scores (pPRS) to assess interactions with meat intake. Methods: A pooled sample of 30,812 CRC cases and 40,504 controls from 27 studies was analyzed. Quantiles for red and processed meat intake were constructed. A total of 204 GWAS variants were annotated to genes with AnnoQ and assessed for overrepresentation in PANTHER-reported pathways. pPRSs were constructed from significantly overrepresented pathways. Covariate-adjusted logistic regression models evaluated interactions between pPRS and red or processed meat intake in relation to CRC risk. Results: A total of 30 variants were overrepresented in four pathways: Presenilin-Alzheimer disease, Cadherin/WNT-signaling, Gonadotropin-releasing hormone receptor, and TGF-β signaling. We found a significant interaction between TGF-β-pPRS and red meat intake (ORint = 0.95; 95% CI = 0.92-0.98; p = 0.003). When variants in the TGF-β pathway were assessed, we observed significant interactions of red meat with rs2337113 (intron SMAD7 gene, Chr18) and rs2208603 (intergenic region BMP5, Chr6) (p = 0.0005 and 0.036, respectively). There was no evidence of pPRS x red meat interactions for other pathways or with processed meat. Conclusions: This pathway-based interaction analysis revealed a statistically significant interaction between variants in the TGF-β pathway and red meat consumption that impacts CRC risk. Impact: These findings shed light on the possible mechanistic link between red meat consumption and CRC risk.
Impact statement: In this work, we developed pathway-based Polygenic Risk Scores which, for the first time, suggested that red meat intake interacts with variants overrepresented in the TGF-β signaling pathway to impact colorectal cancer risk.